ABSTRACT

This paper considers the problem of blind active spread-spectrum (SS) steganalysis, defined as the extraction of hidden data with no prior information. We first develop a multi-signature iterative generalized least-squares (M-IGLS) core procedure to seek unknown messages hidden in image hosts via multi-signature direct-sequence spread-spectrum embedding. Neither the original host nor the embedding signatures are assumed available. Then, cross-correlation enhanced M-IGLS (CC-M-IGLS), a procedure described herein in detail that is based on statistical analysis of repeated independent M-IGLS processing of the host, is seen to offer the most effective hidden message recovery. In fact, experimental studies show that the proposed CC-M-IGLS active SS steganalysis algorithm can achieve probability of error close to what may be attained with known embedding signatures and host autocorrelation matrix.
1. INTRODUCTION

Steganography, which literally means “covered writing” in Greek, is the process of hiding data under a cover medium (also referred to as host), such as image, video, or audio [1]-[3]. The basic purpose of steganography is to establish covert communication between trusting parties. While other data hiding applications (such as watermarking [4]-[6]) have their own individual requirements, the broad common objective of most steganographic applications is a satisfactory trade-off between hidden data resistance to noise/disturbance (robustness), information delivery rate (payload), and low host distortion for concealment purposes.

Steganalysis, the countermeasure technology to steganography, aims to discover the presence and/or extract the content of the secret data. Accordingly, steganalysis can be classified into two categories [7], passive and active. The primary task of passive steganalysis is to decide the presence or absence of hidden messages in given media objects. In contrast, active steganalysis refers to the effort of extracting the actual hidden data¹. While passive steganalysis has been intensively investigated in the past few years [9]-[17], active steganalysis is a relatively new branch of research. To the best of our knowledge, there has been little attempt to develop active steganalysis methods that can blindly extract the secret data.

In this work, we focus our attention on active spread-spectrum (SS) steganalysis. In particular, we aim to recover blindly secret data hidden in image hosts via (multi-signature) direct-sequence SS embedding [18]-[25]. Neither the original host nor the embedding signatures (spreading sequences) are known (fully blind SS steganalysis). In blind active SS steganalysis the unknown host acts as a source of interference/disturbance to the data to be extracted and, in a way, the problem parallels blind signal separation (BSS) applications as they arise in the fields of array processing, biomedical signal processing, and code-division multiple-access (CDMA) communication systems. Under the assumption that the embedded secret messages are independent identically distributed (i.i.d.) random sequences and independent of the cover host, independent component analysis (ICA), one particular family of BSS methods, may be utilized to approach the hidden data extraction problem [7], [26]. However, ICA-based BSS algorithms degrade rapidly in the presence of correlated signal interference, as is the case in SS image embedding. In [27], Gkizeli et al. developed an iterative generalized least-squares (IGLS) procedure to blindly recover unknown messages hidden in image hosts via SS embedding. The algorithm has low complexity and remarkably good recovery performance. However, the scheme is designed solely for single-signature SS embedding where messages are hidden with one signature only. Realistically, a steganographer would favor multi-signature SS embedding to increase security and payload rate. The work in [27] is not generalizable to the multi-signature case.

¹ In another interpretation of active steganalysis, the steganalyst manipulates the embedded data, such as by introducing noise, in hopes of destroying the secret message (if any) [8].

In this paper, we develop a new multi-signature iterative generalized least-squares (M-IGLS) SS steganalysis algorithm for hidden data extraction. For improved recovery performance, in particular for small hidden messages that pose the greatest challenge, we propose an algorithmic upgrade referred to as cross-correlation enhanced M-IGLS (CC-M-IGLS). CC-M-IGLS relies on statistical analysis of independent M-IGLS executions on the host, and experimental studies indicate that it can achieve hidden data recovery with probability of error close to what may be attained with known embedding signatures and known original host autocorrelation matrix.

The  rest  of  the  paper  is  organized  as  follows.  In  Section 
2  we  present  the  signal  model  for  the  multi-signature  SS 
embedding  procedure  and  formulate  the  problem  of  active 
SS  steganalysis.  After  developing  the  hidden  data  extraction 
algorithms  in  Section  3,  experimental  studies  are  presented 
in  Section  4.  Finally,  some  concluding  remarks  are  drawn  in 
Section  5. 

The following notation is used throughout the paper. Boldface lower-case letters indicate column vectors and boldface upper-case letters indicate matrices; $\mathbb{R}$ denotes the set of all real numbers; $(\cdot)^T$ is the transpose operator; $\mathbf{I}_L$ is the $L \times L$ identity matrix; $\mathrm{sgn}\{\cdot\}$ denotes zero-threshold quantization and $E\{\cdot\}$ represents statistical expectation. Finally, $|\cdot|$, $\|\cdot\|$, and $\|\cdot\|_F$ are the scalar magnitude, vector norm, and matrix Frobenius norm, respectively.

2. MULTI-SIGNATURE SS EMBEDDING AND STEGANALYSIS PROBLEM FORMULATION

Consider a host image $\mathbf{H} \in \mathcal{M}^{N_1 \times N_2}$ where $\mathcal{M}$ is the finite image alphabet and $N_1 \times N_2$ is the image size in pixels. Without loss of generality, the image $\mathbf{H}$ is partitioned into $M$ local non-overlapping blocks of size $\frac{N_1 N_2}{M}$. Each block, $\mathbf{H}_1, \mathbf{H}_2, \ldots, \mathbf{H}_M$, is to carry $K$ hidden information bits coming, potentially, from $K$ distinct messages. Embedding is performed in a 2-D transform domain $\mathcal{T}$ (such as the discrete cosine transform, a wavelet transform, etc.). After transform calculation and vectorization (for example by conventional zig-zag scanning), we obtain $\mathcal{T}(\mathbf{H}_m) \in \mathbb{R}^{\frac{N_1 N_2}{M}}$, $m = 1, 2, \ldots, M$. From the transform domain vectors $\mathcal{T}(\mathbf{H}_m)$ we choose a fixed subset of $L \le \frac{N_1 N_2}{M}$ coefficients (bins) to form the final host vectors $\mathbf{x}(m) \in \mathbb{R}^L$, $m = 1, 2, \ldots, M$. It is common and appropriate to avoid the dc coefficient (if applicable) due to the high perceptual sensitivity to changes of the dc value.

Figure 1: (a) Baboon image example $\mathbf{H} \in \{0, 1, \ldots, 255\}^{256 \times 256}$. (b) Host data autocorrelation matrix (8 × 8 DCT, 63-bin host).
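The host-vector formation described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming an 8 × 8 block DCT, conventional zig-zag scanning, and removal of the dc bin only (so $L = 63$); the function names `dct_matrix`, `zigzag_order`, and `host_vectors` are hypothetical and not part of the paper.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so that C @ block @ C.T is the 2-D DCT."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    i = np.arange(n)[None, :]          # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)         # the dc row uses a different scale factor
    return c

def zigzag_order(n: int) -> list:
    """(row, col) pairs of an n x n block in conventional zig-zag order."""
    def key(rc):
        r, c = rc
        d = r + c                       # anti-diagonal index
        return (d, r if d % 2 else c)   # scan direction alternates per diagonal
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def host_vectors(image: np.ndarray, block: int = 8) -> np.ndarray:
    """Return X of shape (L, M): one column of non-dc DCT bins per block."""
    n1, n2 = image.shape
    c = dct_matrix(block)
    order = zigzag_order(block)[1:]     # drop the dc coefficient
    rows = [r for r, _ in order]
    cols = [q for _, q in order]
    columns = []
    for r0 in range(0, n1 - block + 1, block):
        for c0 in range(0, n2 - block + 1, block):
            tile = image[r0:r0 + block, c0:c0 + block].astype(float)
            coeffs = c @ tile @ c.T     # 2-D DCT of one block
            columns.append(coeffs[rows, cols])
    return np.stack(columns, axis=1)
```

For a 256 × 256 image this yields a 63 × 1024 matrix whose columns play the role of $\mathbf{x}(1), \ldots, \mathbf{x}(M)$.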

The autocorrelation matrix of the host data $\mathbf{x}$ is an important statistical quantity for our developments and is defined as $\mathbf{R}_x = E\{\mathbf{x}\mathbf{x}^T\} = \frac{1}{M}\sum_{m=1}^{M} \mathbf{x}(m)\mathbf{x}(m)^T$. It is easy to verify that in general $\mathbf{R}_x \neq \alpha \mathbf{I}_L$, $\alpha > 0$; that is, $\mathbf{R}_x$ is not constant-value diagonal or “white” in field language. For example, 8 × 8 DCT with 63-bin host data formation (excluding only the dc coefficient) for the 256 × 256 gray-scale Baboon image in Fig. 1(a) gives the host autocorrelation matrix $\mathbf{R}_x$ in Fig. 1(b).

2.1  Multi-signature  SS  Embedding 

The $K$ distinct message bit sequences $\{b_k(m)\}_{m=1}^{M}$, $k = 1, 2, \ldots, K$, $b_k(m) \in \{\pm 1\}$, are hidden in the transform-domain host vectors $\{\mathbf{x}(m)\}_{m=1}^{M}$ via additive SS embedding by means of $K$ spreading sequences (signatures) $\mathbf{s}_k \in \mathbb{R}^L$, $\|\mathbf{s}_k\| = 1$, $k = 1, 2, \ldots, K$,

$$\mathbf{y}(m) = \sum_{k=1}^{K} A_k b_k(m)\mathbf{s}_k + \mathbf{x}(m) + \mathbf{n}(m), \quad m = 1, 2, \ldots, M, \tag{1}$$

with corresponding amplitudes $A_k > 0$, $k = 1, \ldots, K$. For the sake of generality, $\mathbf{n}(m) \sim \mathcal{N}(\mathbf{0}, \sigma_n^2 \mathbf{I}_L)$ represents potential external white Gaussian noise² with variance $\sigma_n^2$. It is assumed that $b_k(m)$ behave as equiprobable binary random variables that are independent in time, $m = 1, \ldots, M$, and across messages, $k = 1, \ldots, K$. The contribution of each individual embedded message bit $b_k$ to the composite signal is $A_k b_k \mathbf{s}_k$ and the mean-squared distortion to the original host data $\mathbf{x}$ due to the embedded $k$th message alone is

$$\mathcal{D}_k = E\{\|A_k \mathbf{s}_k b_k\|^2\} = A_k^2, \quad k = 1, 2, \ldots, K. \tag{2}$$

Under statistical independence of messages, the mean-squared distortion of the original image due to the total, multi-message, insertion is $\mathcal{D} = \sum_{k=1}^{K} A_k^2$.
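As an illustration of the additive embedding model in (1) and the per-message distortion in (2), the following sketch embeds $K$ random bit streams into a stand-in host. The helper name `ss_embed`, the synthetic Gaussian host, and the chosen amplitudes are assumptions made for the example only; real host vectors would come from the transform-domain image blocks.

```python
import numpy as np

def ss_embed(X, S, A, B, noise_var=0.0, rng=None):
    """Additive multi-signature SS embedding as in (1).

    X: (L, M) host vectors; S: (L, K) unit-norm signatures;
    A: (K,) amplitudes; B: (K, M) bits in {+1, -1}.
    """
    rng = np.random.default_rng() if rng is None else rng
    V = S * A                                   # column k is A_k * s_k
    N = np.sqrt(noise_var) * rng.standard_normal(X.shape)
    return V @ B + X + N

rng = np.random.default_rng(0)
L, K, M = 63, 4, 1024
X = rng.standard_normal((L, M))                 # stand-in host, not real DCT data
S = rng.standard_normal((L, K))
S /= np.linalg.norm(S, axis=0)                  # enforce ||s_k|| = 1
A = np.array([2.0, 2.0, 3.0, 1.5])              # illustrative amplitudes
B = rng.choice([-1.0, 1.0], size=(K, M))
Y = ss_embed(X, S, A, B, noise_var=2.0, rng=rng)

# Empirical per-message distortion; equals A_k^2 since ||s_k|| = 1 and b_k = +/-1.
per_message = np.array([
    np.mean(np.sum((A[k] * np.outer(S[:, k], B[k])) ** 2, axis=0))
    for k in range(K)
])
```

Because each signature has unit norm and each bit has unit magnitude, `per_message` reproduces $A_k^2$ exactly, which is the content of (2).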

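The intended recipient of the $k$th message can perform hidden bit detection by looking at the sign of the output of the minimum-mean-square-error (MMSE) filter $\mathbf{w}_{\mathrm{MMSE},k}$:

$$\hat{b}_k(m) = \mathrm{sgn}\{\mathbf{w}_{\mathrm{MMSE},k}^T \mathbf{y}(m)\} = \mathrm{sgn}\{\mathbf{s}_k^T \mathbf{R}_y^{-1} \mathbf{y}(m)\} \tag{3}$$

where $\mathbf{R}_y$ is the autocorrelation matrix of the stego vectors $\{\mathbf{y}(m)\}_{m=1}^{M}$,

$$\mathbf{R}_y = E\{\mathbf{y}\mathbf{y}^T\} = \mathbf{R}_x + \sum_{k=1}^{K} A_k^2 \mathbf{s}_k \mathbf{s}_k^T + \sigma_n^2 \mathbf{I}_L. \tag{4}$$

The autocorrelation matrix $\mathbf{R}_y$ can be estimated by sample averaging over the finite set of $M$ stego data, $\hat{\mathbf{R}}_y = \frac{1}{M}\sum_{m=1}^{M} \mathbf{y}(m)\mathbf{y}(m)^T$. Using $\hat{\mathbf{R}}_y$ in (3), we obtain what is known as the sample-matrix-inversion MMSE (SMI-MMSE) detector implementation [28].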
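A minimal sketch of the SMI-MMSE detector in (3) follows, assuming the signatures are known to the recipient. The function name `smi_mmse_detect` and the toy setup (orthonormal signatures, amplitude 8, white unit-variance disturbance) are assumptions chosen so that the example is easy to verify; they are not the paper's experimental configuration.

```python
import numpy as np

def smi_mmse_detect(Y, S):
    """Bit decisions sgn{s_k^T R_y^{-1} y(m)} with R_y estimated from Y itself."""
    M = Y.shape[1]
    R_y = (Y @ Y.T) / M                        # sample autocorrelation of stego data
    W = np.linalg.solve(R_y, S)                # columns are R_y^{-1} s_k
    return np.where(W.T @ Y >= 0, 1.0, -1.0)   # zero-threshold quantization

rng = np.random.default_rng(1)
L, K, M = 16, 2, 2000
S = np.eye(L)[:, :K]                           # two orthonormal toy signatures
B = rng.choice([-1.0, 1.0], size=(K, M))
Y = 8.0 * S @ B + rng.standard_normal((L, M))  # amplitude 8, white disturbance
B_hat = smi_mmse_detect(Y, S)
ber = np.mean(B_hat != B)                      # fraction of wrongly detected bits
```

Solving the linear system instead of forming $\hat{\mathbf{R}}_y^{-1}$ explicitly is a numerical design choice of the sketch; the decisions are the same.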

2.2 Formulation of Active Steganalysis Problem

We assume that the active extraction steganalyst has the ability to obtain transform domain stego data in the form of $\mathbf{y}(m)$ in (1) after performing appropriate image partition, transform, and coefficient selection³ on the image classified as stego by passive steganalysis. We denote the combined “disturbance” to the hidden data (host plus noise) by $\mathbf{z}(m) = \mathbf{x}(m) + \mathbf{n}(m)$. Then, SS embedding by (1) can be rewritten as

$$\mathbf{y}(m) = \sum_{k=1}^{K} A_k b_k(m)\mathbf{s}_k + \mathbf{z}(m), \quad m = 1, \ldots, M, \tag{5}$$

where $\mathbf{z}(m)$ is modeled as a sequence of zero-mean (without loss of generality) vectors with autocovariance matrix $\mathbf{R}_z = E\{\mathbf{z}\mathbf{z}^T\} = \mathbf{R}_x + \sigma_n^2 \mathbf{I}_L$. Let $\mathbf{v}_k = A_k \mathbf{s}_k \in \mathbb{R}^L$, $k = 1, \ldots, K$, be amplitude-including embedding signatures. Then, we can further rewrite SS embedding as

$$\mathbf{y}(m) = \sum_{k=1}^{K} b_k(m)\mathbf{v}_k + \mathbf{z}(m) \tag{6}$$

$$= \mathbf{V}\mathbf{b}(m) + \mathbf{z}(m), \quad m = 1, \ldots, M, \tag{7}$$

where $\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_K] \in \mathbb{R}^{L \times K}$ is the amplitude-including signature matrix and $\mathbf{b}(m) \in \{\pm 1\}^{K \times 1}$ is the vector of bits embedded in the $m$th host block. For notational simplicity, we can write the whole stego image data as one matrix

$$\mathbf{Y} = \mathbf{V}\mathbf{B} + \mathbf{Z} \tag{8}$$

where $\mathbf{Y} = [\mathbf{y}(1)\ \mathbf{y}(2)\ \ldots\ \mathbf{y}(M)] \in \mathbb{R}^{L \times M}$, $\mathbf{B} = [\mathbf{b}(1)\ \mathbf{b}(2)\ \ldots\ \mathbf{b}(M)] \in \{\pm 1\}^{K \times M}$, and $\mathbf{Z} = [\mathbf{z}(1)\ \mathbf{z}(2)\ \ldots\ \mathbf{z}(M)] \in \mathbb{R}^{L \times M}$.

Our objective is to blindly extract the unknown hidden data $\mathbf{B}$ from the stego data $\mathbf{Y}$ without prior knowledge of the embedding signatures $\mathbf{s}_k$ and amplitudes $A_k$, $k = 1, \ldots, K$, in $\mathbf{V} = [A_1\mathbf{s}_1, \ldots, A_K\mathbf{s}_K]$ or the host itself $\mathbf{x}(1), \ldots, \mathbf{x}(M)$ in $\mathbf{Z} = [\mathbf{x}(1) + \mathbf{n}(1), \ldots, \mathbf{x}(M) + \mathbf{n}(M)]$.

² Additive white Gaussian noise is frequently viewed as a suitable model for quantization errors, channel transmission disturbances, and/or image processing attacks.

³ Host image partition may be estimated by examining the difference between neighboring pixels [14]. For each investigated transform, all coefficients (except the dc value) may be considered.

3. ACTIVE STEGANALYSIS FOR HIDDEN DATA EXTRACTION

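If $\mathbf{Z}$ were to be modeled as Gaussian distributed, the joint maximum-likelihood (ML) estimator of $\mathbf{V}$ and detector of $\mathbf{B}$ would be

$$\hat{\mathbf{V}}, \hat{\mathbf{B}} = \arg\min_{\mathbf{B} \in \{\pm 1\}^{K \times M},\ \mathbf{V} \in \mathbb{R}^{L \times K}} \left\|\mathbf{R}_z^{-\frac{1}{2}}(\mathbf{Y} - \mathbf{V}\mathbf{B})\right\|_F^2 \tag{9}$$

where multiplication by $\mathbf{R}_z^{-\frac{1}{2}}$ can be interpreted as prewhitening of the compound observation data. If Gaussianity of $\mathbf{Z}$ is not to be invoked, then (9) is simply referred to as the joint generalized least-squares (GLS) solution⁴ of $\mathbf{V}$ and $\mathbf{B}$.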
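To make the prewhitening interpretation concrete, the sketch below computes a symmetric $\mathbf{R}_z^{-1/2}$ by eigendecomposition and evaluates the cost in (9) for given $\mathbf{V}$ and $\mathbf{B}$. The helper names `inv_sqrt` and `gls_cost` are hypothetical, and the sketch assumes $\mathbf{R}_z$ is known and positive definite, which is not the case for the blind steganalyst.

```python
import numpy as np

def inv_sqrt(R):
    """Symmetric inverse square root R^{-1/2} of a positive-definite matrix."""
    eigval, eigvec = np.linalg.eigh(R)              # R = U diag(eigval) U^T
    return (eigvec / np.sqrt(eigval)) @ eigvec.T    # U diag(eigval^{-1/2}) U^T

def gls_cost(Y, V, B, R_z):
    """|| R_z^{-1/2} (Y - V B) ||_F^2, the criterion minimized in (9)."""
    return np.linalg.norm(inv_sqrt(R_z) @ (Y - V @ B), "fro") ** 2
```

After multiplication by `inv_sqrt(R_z)` the disturbance has identity autocorrelation, so the weighted criterion reduces to an ordinary least-squares fit on the whitened data.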

3.1 Multi-signature Iterative Generalized Least-Squares Procedure

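The global GLS-optimal message matrix $\hat{\mathbf{B}}$ in (9) can be computed independently of $\mathbf{V}$ by exhaustive search over all possible choices under the criterion function $\|\mathbf{R}_z^{-\frac{1}{2}} \mathbf{Y} \mathbf{P}_B^{\perp}\|_F$,

$$\hat{\mathbf{B}} = \arg\min_{\mathbf{B} \in \{\pm 1\}^{K \times M}} \left\|\mathbf{R}_z^{-\frac{1}{2}} \mathbf{Y} \mathbf{P}_B^{\perp}\right\|_F \tag{10}$$

where $\mathbf{P}_B^{\perp} = \mathbf{I} - \mathbf{B}^T(\mathbf{B}\mathbf{B}^T)^{-1}\mathbf{B}$. Exhaustive search has, of course, complexity exponential in $KM$ (total size of hidden messages in bits). We consider this cost unacceptable and attempt to reach a quality approximation of the solution of (10) (or (9), for that matter) by alternating generalized least-squares estimates of $\mathbf{V}$ and $\mathbf{B}$, iteratively, as described below.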
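To give a sense of scale for the exhaustive search, consider as an illustrative assumption $K = 4$ messages of $M = 1{,}024$ bits each, a setting of the size used in the experimental studies. The number of candidate bit matrices is then

```latex
\[
  \bigl|\{\pm 1\}^{K \times M}\bigr| \;=\; 2^{KM} \;=\; 2^{4096} \;\approx\; 10^{1233},
  \qquad \text{since } 4096 \log_{10} 2 \approx 1233.02 .
\]
```

Even ignoring the cost of evaluating the criterion for each candidate, enumeration at this scale is out of reach, which is the motivation for the alternating procedure.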

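Pretend $\mathbf{B}$ is known; the generalized least-squares estimate of $\mathbf{V}$ is

$$\hat{\mathbf{V}}_{\mathrm{GLS}} = \arg\min_{\mathbf{V} \in \mathbb{R}^{L \times K}} \left\|\mathbf{R}_z^{-\frac{1}{2}}(\mathbf{Y} - \mathbf{V}\mathbf{B})\right\|_F^2 = \mathbf{Y}\mathbf{B}^T(\mathbf{B}\mathbf{B}^T)^{-1}. \tag{11}$$

⁴ Generalized least-squares solutions are weighted least-squares (WLS) solutions with optimal weighting matrices, here $\mathbf{R}_z^{-\frac{1}{2}}$, that yield the lowest variance of the estimation error [31], [34].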

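Pretend, in turn, that $\mathbf{V}$ is known; then, the least-squares estimate of $\mathbf{B}$ over the real field is

$$\hat{\mathbf{B}}_{\mathrm{GLS}}^{\mathrm{real}} = \arg\min_{\mathbf{B} \in \mathbb{R}^{K \times M}} \left\|\mathbf{R}_z^{-\frac{1}{2}}(\mathbf{Y} - \mathbf{V}\mathbf{B})\right\|_F^2 = (\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})^{-1}\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{Y}. \tag{12}$$

Observing that

$$(\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})^{-1}\mathbf{V}^T\mathbf{R}_z^{-1} = (\mathbf{V}^T\mathbf{R}_y^{-1}\mathbf{V})^{-1}\mathbf{V}^T\mathbf{R}_y^{-1} \tag{13}$$

we rewrite

$$\hat{\mathbf{B}}_{\mathrm{GLS}}^{\mathrm{real}} = (\mathbf{V}^T\mathbf{R}_y^{-1}\mathbf{V})^{-1}\mathbf{V}^T\mathbf{R}_y^{-1}\mathbf{Y} \tag{14}$$

and suggest the approximate binary message solution

$$\hat{\mathbf{B}}_{\mathrm{GLS}}^{\mathrm{binary}} = \arg\min_{\mathbf{B} \in \{\pm 1\}^{K \times M}} \left\|\mathbf{R}_z^{-\frac{1}{2}}(\mathbf{Y} - \mathbf{V}\mathbf{B})\right\|_F^2 \approx \mathrm{sgn}\left\{(\mathbf{V}^T\mathbf{R}_y^{-1}\mathbf{V})^{-1}\mathbf{V}^T\mathbf{R}_y^{-1}\mathbf{Y}\right\}. \tag{15}$$

The proofs of (11), (12), and (13) are provided in the appendix.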
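The identity in (13) is what lets the steganalyst replace the unknown $\mathbf{R}_z$ by the observable $\mathbf{R}_y$. A quick numerical check, under the model $\mathbf{R}_y = \mathbf{V}\mathbf{V}^T + \mathbf{R}_z$ and with randomly drawn matrices (the dimensions and the helper name `gls_filter` are arbitrary choices for the illustration), can be sketched as follows.

```python
import numpy as np

rng = np.random.default_rng(0)
L, K = 8, 3
V = rng.standard_normal((L, K))
G = rng.standard_normal((L, L))
R_z = G @ G.T + np.eye(L)          # arbitrary positive-definite disturbance matrix
R_y = V @ V.T + R_z                # stego autocorrelation under Y = VB + Z

def gls_filter(V, R):
    """(V^T R^{-1} V)^{-1} V^T R^{-1}, using linear solves for symmetric R."""
    RinvV = np.linalg.solve(R, V)                  # R^{-1} V
    return np.linalg.solve(V.T @ RinvV, RinvV.T)   # RinvV.T equals V^T R^{-1}

lhs = gls_filter(V, R_z)           # left-hand side of (13)
rhs = gls_filter(V, R_y)           # right-hand side of (13)
max_gap = np.max(np.abs(lhs - rhs))
```

In exact arithmetic the two filters coincide; `max_gap` only reflects floating-point rounding. Both filters also satisfy `filter @ V = I`, so they pass the embedded bits through with unit gain.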

The multi-signature iterative generalized least-squares (M-IGLS) procedure suggested by the two equations (11) and (15) is now straightforward. Initialize $\mathbf{B}$ arbitrarily and alternate iteratively between (11) and (15) to obtain at each step conditionally generalized least-squares estimates of one matrix parameter given the other. Stop when convergence is observed. Notice that (15) requires knowledge of the autocorrelation matrix of the stego data $\mathbf{R}_y$, which can be estimated by sample averaging over the received data observations, $\hat{\mathbf{R}}_y = \frac{1}{M}\sum_{m=1}^{M} \mathbf{y}(m)\mathbf{y}(m)^T$. The M-IGLS SS steganalysis algorithm is summarized in Table 1. Superscripts denote iteration index. For the sake of mathematical accuracy, we emphasize that there is always a sign/phase ambiguity present when one considers joint data extraction and signature identification. The sign ambiguity problem can be overcome with a few known or guessed data symbols for sign correction.
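The alternation between (11) and (15) can be sketched as below. This is a minimal illustration rather than the authors' implementation: the function name `m_igls`, the iteration cap `max_iter`, and the use of pseudo-inverses (to avoid failures if an intermediate bit matrix becomes rank deficient) are assumptions added here. The toy demonstration only checks that, at high embedding strength, the true bit matrix is a fixed point of the iteration; it says nothing about convergence from an arbitrary start.

```python
import numpy as np

def m_igls(Y, K, B0=None, max_iter=100, rng=None):
    """Alternate (11) and (15) until the bit matrix stops changing.

    Y: (L, M) stego vectors. Returns (B_hat, V_hat, iterations_used).
    max_iter is a safeguard added in this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    L, M = Y.shape
    R_y = (Y @ Y.T) / M                                # sample autocorrelation
    B = rng.choice([-1.0, 1.0], size=(K, M)) if B0 is None else B0.copy()
    for d in range(1, max_iter + 1):
        V = Y @ np.linalg.pinv(B)                      # Y B^T (B B^T)^{-1} for full-rank B
        RinvV = np.linalg.solve(R_y, V)                # R_y^{-1} V
        soft = np.linalg.pinv(V.T @ RinvV) @ (RinvV.T @ Y)
        B_new = np.where(soft >= 0, 1.0, -1.0)         # zero-threshold quantization
        converged = np.array_equal(B_new, B)
        B = B_new
        if converged:
            break
    return B, Y @ np.linalg.pinv(B), d                 # signature estimate for final B

rng = np.random.default_rng(2)
L, K, M = 16, 2, 500
V_true = 10.0 * np.eye(L)[:, :K]                       # strong, orthogonal toy signatures
B_true = rng.choice([-1.0, 1.0], size=(K, M))
Y = V_true @ B_true + rng.standard_normal((L, M))      # white stand-in disturbance
B_hat, V_hat, n_iter = m_igls(Y, K, B0=B_true)
```

When called without `B0`, the routine starts from a random bit matrix; the returned solution is then subject to the sign ambiguity noted above and, with several signatures, to an unknown ordering of the messages.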

3.2  Cross-Correlation  Enhanced  M-IGLS 

We understand that, with arbitrary initialization, convergence of the M-IGLS procedure described in Table 1 to the optimal GLS solution of (9) is not guaranteed in general. Extensive experimentation with the algorithm in Table 1 indicates that, for sufficiently long messages hidden by each signature ($M = 4$ Kbits or more, for example), satisfactory quality message decisions $\hat{\mathbf{B}}$ can be obtained. However, when the message size is small, M-IGLS may very well converge to/return wrong solutions. The quality (generalized least-squares fit) of the end convergence point depends heavily on the initialization point, and arbitrary initialization, which at first sight is unavoidable for blind steganalysis, offers little assurance that the iterative scheme will lead us to appropriate, “reliable” (close to minimal generalized least-squares fit) solutions. Re-initialization and re-execution of the M-IGLS procedure is always possible, but the challenge is how to assess whether solutions returned by the M-IGLS procedure are reliable or not without any side information. The rest of this section is devoted to addressing this challenge.

Table 1: Iterative generalized least-squares SS steganalysis

1) $d := 0$; initialize $\hat{\mathbf{B}}^{(0)} \in \{\pm 1\}^{K \times M}$ arbitrarily.
2) $d := d + 1$;
   $\hat{\mathbf{V}}^{(d)} := \mathbf{Y}(\hat{\mathbf{B}}^{(d-1)})^T\left[\hat{\mathbf{B}}^{(d-1)}(\hat{\mathbf{B}}^{(d-1)})^T\right]^{-1}$;
   $\hat{\mathbf{B}}^{(d)} := \mathrm{sgn}\left\{\left((\hat{\mathbf{V}}^{(d)})^T\hat{\mathbf{R}}_y^{-1}\hat{\mathbf{V}}^{(d)}\right)^{-1}(\hat{\mathbf{V}}^{(d)})^T\hat{\mathbf{R}}_y^{-1}\mathbf{Y}\right\}$.
3) Repeat Step 2 until $\hat{\mathbf{B}}^{(d)} = \hat{\mathbf{B}}^{(d-1)}$.

Since $\mathbf{B}$ and $\mathbf{V}$ are jointly detected and estimated, respectively, if one is not reliable neither is the other in general. We first examine the reliability of the bit matrix decision $\hat{\mathbf{B}} = [\hat{\mathbf{b}}_1, \ldots, \hat{\mathbf{b}}_K]^T$ returned by the M-IGLS procedure of Table 1. The sample cross-correlation between any two bit streams is

$$r_{i,j} = \hat{\mathbf{b}}_i^T\hat{\mathbf{b}}_j / M, \quad i \neq j, \quad i, j = 1, \ldots, K. \tag{16}$$

Formally, the true information bits are independent within user streams and across users. If $r_{i,j}$ were to be viewed as approximately normally distributed with zero mean and variance $\frac{1}{M}$, then the probability of $|r_{i,j}|$, $i \neq j$, being larger than, say, the threshold value $\frac{3}{\sqrt{M}}$ (three standard deviations) is very low at about 0.3% (we can calculate $\Pr(|r_{i,j}| > \frac{3}{\sqrt{M}}) \approx 0.003$). Motivated by this calculation, we introduce below Criterion 1 that classifies convergence points of the M-IGLS procedure in Table 1 as “compliant” or not based on the sample statistics of the returned data matrix $\hat{\mathbf{B}}$.

Criterion 1: If for all $i \neq j \in \{1, 2, \ldots, K\}$, $|r_{i,j}| \le \frac{3}{\sqrt{M}}$, then $(\hat{\mathbf{B}}, \hat{\mathbf{V}})$ returned by the M-IGLS procedure in Table 1 are classified as “Criterion-1-compliant.” ■
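A compliance test of this kind can be sketched as follows, assuming the criterion takes the form $|r_{i,j}| \le 3/\sqrt{M}$ for all $i \neq j$ (the three-standard-deviation threshold motivated by the calculation above). The function name `criterion1_compliant` is hypothetical. The last line evaluates the two-sided Gaussian tail probability behind the quoted 0.3% figure.

```python
import math
import numpy as np

def criterion1_compliant(B_hat):
    """True if every pair of distinct bit streams has |r_ij| <= 3 / sqrt(M)."""
    K, M = B_hat.shape
    r = (B_hat @ B_hat.T) / M                 # r[i, j] = b_i^T b_j / M
    off_diag = r[~np.eye(K, dtype=bool)]      # drop the i == j entries
    return bool(np.all(np.abs(off_diag) <= 3.0 / math.sqrt(M)))

# Probability that a zero-mean normal variable exceeds three standard
# deviations in magnitude: Pr(|r| > 3/sqrt(M)) when var(r) = 1/M.
p_exceed = math.erfc(3.0 / math.sqrt(2.0))
```

A returned solution in which two bit streams are identical (or sign-flipped copies of each other) has $|r_{i,j}| = 1$ and is rejected whenever $M > 9$, which is the typical failure mode this coarse check catches.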

Criterion 1 provides the means for coarse identification of unreliable solutions. An unreliable convergence point would then trigger re-initialization and re-execution of the M-IGLS procedure in Table 1 until a Criterion-1-compliant point is obtained. To enhance the end accuracy of blind hidden data extraction, we propose one additional criterion based on the returned estimated signature matrix $\hat{\mathbf{V}}$. We will motivate our proposal by examining experimentally the normalized cross-correlation between the estimated signatures $\hat{\mathbf{v}}_k$ returned by the Criterion-1-equipped M-IGLS procedure and the true signatures $\mathbf{v}_k$, $k = 1, \ldots, K$. We consider as a host example the gray-scale 256 × 256 “Baboon” image of Fig. 1(a) and perform 8 × 8 block DCT embedding by (1) over all bins except the dc coefficient with $K = 4$ distinct arbitrary signatures $\mathbf{s}_k \in \mathbb{R}^{63}$ and per-message distortion $\mathcal{D}_k = 31.5$ dB, $k = 1, \ldots, 4$. For the sake of generality, we also incorporate white Gaussian noise of variance $\sigma_n^2 = 3$ dB. We run the Criterion-1-equipped M-IGLS procedure 400 times. The histogram of the normalized cross-correlation values $\frac{\hat{\mathbf{v}}_k^T\mathbf{v}_k}{\|\hat{\mathbf{v}}_k\|\,\|\mathbf{v}_k\|}$ of the four hundred returned solutions for message $k = 1$ in Fig. 2 (representative of all other messages) reveals that Criterion 1 is not by itself sufficient to eliminate erroneous solutions. Yet, there exists a tight cluster/region formed by 210 or so of the Criterion-1-equipped M-IGLS convergence points around the true embedding signature.

Figure 2: Histogram of normalized cross-correlation between $\hat{\mathbf{v}}_1$ and $\mathbf{v}_1$ (256 × 256 Baboon image, 8 × 8 DCT, $L = 63$, $K = 4$, $\mathcal{D}_k = 31.5$ dB, $k = 1, \ldots, 4$, $\sigma_n^2 = 3$ dB; $\hat{\mathbf{v}}_1$ returned by Table 1 M-IGLS steganalysis procedure).

The basic idea now behind our second and final refinement of the M-IGLS blind hidden data extraction procedure is to identify and average these reliable clustered estimates. Of course, identification of the reliable estimates is not a trivial task due to our complete lack of knowledge of $\mathbf{v}_k$ (or $\mathbf{s}_k$), $k = 1, \ldots, K$. In this context, assume that we have $P$ estimates of $\mathbf{v}_k$ denoted by $\hat{\mathbf{v}}_k^{(j)}$, $k = 1, \ldots, K$, $j = 1, \ldots, P$, obtained by $P$ runs of the Criterion-1-equipped M-IGLS procedure. From the example of Fig. 2, we understand that reliable estimates of $\mathbf{v}_k$ have high normalized cross-correlation (close to 1) with each other, while they will have low normalized cross-correlation with other unreliable estimates of $\mathbf{v}_k$. In contrast, unreliable estimates will tend to have low normalized cross-correlation with each other. Therefore, the reliability of $\hat{\mathbf{v}}_k^{(j)}$ may be quantified/assessed by examining the sum-cross-correlation with the other $\hat{\mathbf{v}}_k^{(i)}$, $i \neq j \in \{1, \ldots, P\}$,

$$\rho_k^{(j)} \triangleq \sum_{i=1,\, i \neq j}^{P} \frac{\left|\hat{\mathbf{v}}_k^{(i)T}\hat{\mathbf{v}}_k^{(j)}\right|}{\|\hat{\mathbf{v}}_k^{(i)}\|\,\|\hat{\mathbf{v}}_k^{(j)}\|}. \tag{17}$$

A reasonable threshold value for binary reliability classification may be the average value

$$\bar{\rho}_k = \frac{1}{P}\sum_{j=1}^{P} \rho_k^{(j)}, \quad k = 1, \ldots, K, \tag{18}$$

utilized in the proposed Criterion 2 below.

Criterion 2: Let $\hat{\mathbf{v}}_k^{(j)}$ be the estimates of $\mathbf{v}_k$ returned by $P$ arbitrary initializations of the Criterion-1-equipped M-IGLS procedure of Table 1, $k = 1, \ldots, K$, $j = 1, \ldots, P$. If $\rho_k^{(j)} > \bar{\rho}_k$, then $\hat{\mathbf{v}}_k^{(j)}$ is considered a reliable estimate of $\mathbf{v}_k$; otherwise we declare it as unreliable. ■
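The reliability scoring behind Criterion 2 can be sketched for a single message index $k$ as follows. The sketch assumes the sum-cross-correlation uses the magnitude of the normalized inner product, so that the sign ambiguity of the estimates does not matter, and that the threshold is the plain average of the scores; the function name `criterion2_reliable` and the array layout (one estimate per row) are choices made for the illustration.

```python
import numpy as np

def criterion2_reliable(estimates):
    """estimates: (P, L) array, row j is the j-th run's estimate of one v_k.

    Returns (scores, threshold, mask); mask[j] is True for reliable rows.
    """
    unit = estimates / np.linalg.norm(estimates, axis=1, keepdims=True)
    c = np.abs(unit @ unit.T)                 # normalized |cross-correlation|, P x P
    scores = c.sum(axis=1) - np.diag(c)       # sum over i != j, one score per run
    threshold = scores.mean()                 # average score over the P runs
    return scores, threshold, scores > threshold

# Three mutually aligned estimates (up to scale and sign) and two outliers.
est = np.array([[1, 0, 0], [2, 0, 0], [-1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
scores, threshold, mask = criterion2_reliable(est)
```

In this toy case the three aligned rows each score 2, the two outliers score 0, the threshold is 1.2, and only the aligned rows are flagged reliable. If all scores happen to be equal, no row exceeds the average and the mask is empty, a corner case a practical implementation would need to handle.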


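Table 2: Cross-correlation enhanced M-IGLS

For $j := 1$ to $P$
   1) Execute M-IGLS of Table 1 with arbitrary initialization and obtain estimates $\hat{\mathbf{v}}_k$, $k = 1, \ldots, K$.
   2) If estimates are Criterion-1-compliant, $\hat{\mathbf{v}}_k^{(j)} := \hat{\mathbf{v}}_k$, $k = 1, \ldots, K$; else go to 1).
End
For $k := 1$ to $K$
   3) Identify reliable estimates for $\mathbf{v}_k$ according to Criterion 2.
   4) Calculate the average over all reliable estimates, $\bar{\mathbf{v}}_k$, by (19).
End
5) Set $\bar{\mathbf{V}} = [\bar{\mathbf{v}}_1, \ldots, \bar{\mathbf{v}}_K]$.
6) Execute M-IGLS of Table 1 with initialization $\hat{\mathbf{B}}^{(0)} = \mathrm{sgn}\left\{(\bar{\mathbf{V}}^T\hat{\mathbf{R}}_y^{-1}\bar{\mathbf{V}})^{-1}\bar{\mathbf{V}}^T\hat{\mathbf{R}}_y^{-1}\mathbf{Y}\right\}$.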


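Finally, we average our reliable (according to Criterion 2) estimates of the effective signatures $\mathbf{v}_k$ to produce one last high-quality initialization of the M-IGLS algorithm of Table 1. Let $\mathcal{S}_k$ denote the set of all reliable estimates of $\mathbf{v}_k$ according to Criterion 2 and let $|\mathcal{S}_k|$ denote the cardinality of $\mathcal{S}_k$. Our averaged estimate of matrix $\mathbf{V}$ is now given by $\bar{\mathbf{V}}$ with

$$\bar{\mathbf{V}} = [\bar{\mathbf{v}}_1, \ldots, \bar{\mathbf{v}}_K] \quad \text{where} \quad \bar{\mathbf{v}}_k = \frac{1}{|\mathcal{S}_k|}\sum_{j \in \mathcal{S}_k} \hat{\mathbf{v}}_k^{(j)}, \quad k = 1, \ldots, K, \tag{19}$$

i.e. $\bar{\mathbf{v}}_k$ is the average over all reliable estimates of $\mathbf{v}_k$ according to Criterion 2. We execute M-IGLS in Table 1 a final time initialized at $\hat{\mathbf{B}}^{(0)} = \mathrm{sgn}\left\{(\bar{\mathbf{V}}^T\hat{\mathbf{R}}_y^{-1}\bar{\mathbf{V}})^{-1}\bar{\mathbf{V}}^T\hat{\mathbf{R}}_y^{-1}\mathbf{Y}\right\}$. We call M-IGLS with both Criteria 1 and 2 incorporated cross-correlation enhanced M-IGLS (CC-M-IGLS) and summarize the complete procedure in Table 2.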
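The averaging step in (19) can be sketched as below for one message index. Two points are assumptions of this sketch and are not stated in the text: the reliable estimates are sign-aligned to the first reliable one before averaging (otherwise estimates of opposite sign would cancel), and the rows are taken to refer to the same message across runs. The function name `average_reliable` is hypothetical.

```python
import numpy as np

def average_reliable(estimates, mask):
    """Average the rows of `estimates` flagged reliable, after sign alignment.

    estimates: (P, L) array of per-run estimates of one v_k.
    mask: boolean array of length P (at least one True entry is assumed).
    Sign alignment to the first reliable row is an assumption of this sketch;
    (19) itself is written as a plain average over the reliable set.
    """
    chosen = estimates[mask]
    ref = chosen[0]
    signs = np.where(chosen @ ref >= 0, 1.0, -1.0)   # flip rows anti-correlated with ref
    return (signs[:, None] * chosen).mean(axis=0)

est = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
v_bar = average_reliable(est, np.array([True, True, False]))
```

Here the second row is a sign-flipped copy of the first, so after alignment the average is `[1, 0]`; without alignment the two reliable rows would average to zero.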

4.  EXPERIMENTAL  STUDIES 

A  technically  firm  and  keen  measure  of  quality  of  an  active 
steganalysis  solution  is  the  difference  in  the  bit-error-rate 
(BER)  experienced  by  the  intended  recipient  and  the  ste- 
ganalyst.  The  intended  recipient  in  our  studies  may  be 
using  any  of  the  following  three  message  recovery  meth¬ 
ods:  (*)  Standard  signature  matched-filtering  (MF)  with 
the  known  signatures  sk,  k  =  1  ( ii )  sample-matrix- 

inversion  MMSE  (SMI-MMSE)  filtering  with  known  signa¬ 
tures  s k  and  estimated  host  autocorrelation  matrix  Ry  (see 
(3));  (Hi)  ideal  MMSE  filtering  with  known  signatures  sk 
and  known  true  host  autocorrelation  matrix  Rx  which  serves 
as  the  ultimate  performance  bound  reference  for  all  meth¬ 
ods.  In  terms  of  blind  active  steganalysis  (neither  sk  nor  Rx 
known),  we  will  examine  (iv)  the  developed  M-IGLS  algo¬ 
rithm  in  Table  1  alone  and  (v)  CC-M-IGLS  of  Table  2  with 
P  =  20  Criterion-1  runs.  Finally,  the  performance  of  two 
typical  ICA-based  blind  signal  separation  (BSS)  algorithms, 
(vi)  FastICA  [35],  and  (vii)  JADE  [36],  will  also  be  included 
in  the  studies  for  comparison  purposes. 
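For reference, the average BER used as the performance measure can be computed as in the sketch below. It resolves the per-message sign ambiguity of blind extraction by taking the better of the two polarities against the true bits, which is an evaluation convenience assumed here (in practice a few known or guessed symbols would be used), and it assumes the extracted streams are already matched to the true messages in order. The function name `ber_up_to_sign` is hypothetical.

```python
import numpy as np

def ber_up_to_sign(B_true, B_hat):
    """Average bit-error rate after resolving the per-message sign ambiguity."""
    errors = np.mean(B_true != B_hat, axis=1)        # per-message error rate as returned
    resolved = np.minimum(errors, 1.0 - errors)      # a flipped stream has rate 1 - e
    return float(np.mean(resolved))

B_true = np.array([[1, -1, 1, 1], [1, 1, -1, -1]], dtype=float)
B_hat = np.array([[-1, 1, -1, 1], [1, 1, -1, -1]], dtype=float)
avg_ber = ber_up_to_sign(B_true, B_hat)
```

In the toy example the first stream is returned with inverted polarity and one genuine error (rate 0.25 after sign resolution) and the second stream is error-free, giving an average BER of 0.125.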


Figure 3: Average BER versus per-message distortion (512 × 512 Baboon, $L = 63$, $K = 4$ messages of 4 Kbits each, $\sigma_n^2 = 3$ dB).

Figure 4: Average BER versus per-message distortion (256 × 256 Baboon, $L = 63$, $K = 4$ messages of 1 Kbit each, $\sigma_n^2 = 3$ dB).

We first consider as a host example the gray-scale 512 × 512 “Baboon” image. We perform 8 × 8 block DCT embedding by (1) over all bins except the dc coefficient with $K = 4$ distinct arbitrary signatures $\mathbf{s}_k \in \mathbb{R}^{63}$, $k = 1, \ldots, 4$. The hidden message embedded by each signature is $\frac{512^2}{8^2} = 4{,}096$ bits long. The per-message mean-square distortion due to each embedded message is set to be the same for all messages, i.e. $\mathcal{D}_k = A_k^2 = \mathcal{D}$, $k = 1, \ldots, 4$. For the sake of generality, we also incorporate white Gaussian noise of variance $\sigma_n^2 = 3$ dB. Fig. 3 shows the average BER (over all $K = 4$ messages) of all methods (i) through (vii) listed above as a function of the host distortion per message. While the independent/principal-component methods (FastICA and JADE) fail to carry out effective active SS image steganalysis, to our satisfaction CC-M-IGLS SS steganalysis is rather close in BER performance to the ideal MMSE detector bound where both the embedding signatures and the clean host autocorrelation matrix $\mathbf{R}_x$ are perfectly known. It could be argued that for this host and rather large size of $M = 4{,}096$ bits per message, CC-M-IGLS offers only a moderate gain in comparison with M-IGLS of Table 1 by itself.

Figure 5: 512 × 512 gray-scale Boat image.


In Fig. 4, however, we repeat the exact same experimental study on the smaller 256 × 256 version of the Baboon image of Fig. 1(a) with $K = 4$ hidden messages of length only $\frac{256^2}{8^2} = 1{,}024$ bits per message. CC-M-IGLS now provides dramatic performance improvement over M-IGLS, which surely justifies the extra computational cost and extraction delay. At the same time, comparing with Fig. 3, the gap between CC-M-IGLS and ideal MMSE increases as the hidden message size (use of signature, individually) decreases.

For  additional  experimental  validation,  the  studies  of  Fig.  3 
and  Fig.  4  are  repeated  on  the  familiar  “Boat”  image  (shown 
in  Fig.  5)  in  its  512  x  512  and  256  x  256  gray-scale  versions 
(Fig.  6  and  Fig.  7,  correspondingly).  Identical  conclusions 
are  drawn  regarding  the  effectiveness  of  CC-M-IGLS  blind 
active  steganalysis. 

Finally, to examine the behavior of CC-M-IGLS under increased-density small-message hiding, we consider the 256 × 256 gray-scale “F-16 Aircraft” image (shown in Fig. 8) with $K = 4$ or $K = 8$ hidden messages of length 1 Kbit each. Recovery performance plots are given in Fig. 9 and Fig. 10, correspondingly. An encompassing conclusion over all executed experiments is that CC-M-IGLS remains a most effective technique to extract blindly hidden messages, while extraction becomes more challenging as the length of hidden messages (use of an embedding signature) decreases or the number of hidden messages (number of used signatures) increases.


Figure 6: Average BER versus per-message distortion (512 × 512 Boat, $L = 63$, $K = 4$ messages of 4 Kbits each, $\sigma_n^2 = 3$ dB).

Figure 7: Average BER versus per-message distortion (256 × 256 Boat, $L = 63$, $K = 4$ messages of 1 Kbit each, $\sigma_n^2 = 3$ dB).

Figure 8: 256 × 256 gray-scale Aircraft image.

Figure 9: Average BER versus per-message distortion (256 × 256 Aircraft, $L = 63$, $K = 4$ messages of 1 Kbit each, $\sigma_n^2 = 3$ dB).

Figure 10: Average BER versus per-message distortion (256 × 256 Aircraft, $L = 63$, $K = 8$ messages of 1 Kbit each, $\sigma_n^2 = 3$ dB).

5. CONCLUSIONS

In this paper we considered the problem of active blind spread-spectrum steganalysis and attempted to recover unknown messages hidden in image hosts via multi-signature spread-spectrum embedding. Neither the original host nor the embedding signatures are assumed available. We first developed a low-complexity multi-signature iterative generalized least-squares (M-IGLS) core algorithm. Cross-correlation enhanced M-IGLS (CC-M-IGLS), a procedure based on statistical analysis of repeated independent M-IGLS processing of the host, offers the most effective blind hidden message recovery. In fact, experimental studies showed that CC-M-IGLS can achieve probability of error rather close to what may be attained with known embedding signatures and known original host autocorrelation matrix, and presents itself as an efficient countermeasure to conventional⁵ SS steganography.

⁵ In [26], Bas and Cayre present an interesting signature-based additive embedding approach different from (1) that is host-vector-by-host-vector dependent and would withstand IGLS-based active steganalysis. The embedding is, however, very sensitive to noise, which would lead to high recovery error rates by intended recipients and limit the applicability to general covert communication problems.
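APPENDIX

Proof of (11)

The GLS cost function in (9) can be rewritten as

$$J = \left\|\mathbf{R}_z^{-\frac{1}{2}}\mathbf{Y} - \mathbf{R}_z^{-\frac{1}{2}}\mathbf{V}\mathbf{B}\right\|_F^2 \tag{20}$$

$$= \mathrm{tr}\{\mathbf{R}_z^{-1}\mathbf{Y}\mathbf{Y}^T\} - \mathrm{tr}\{\mathbf{R}_z^{-1}\mathbf{Y}\mathbf{B}^T\mathbf{V}^T\} - \mathrm{tr}\{\mathbf{R}_z^{-1}\mathbf{V}\mathbf{B}\mathbf{Y}^T\} + \mathrm{tr}\{\mathbf{R}_z^{-1}\mathbf{V}\mathbf{B}\mathbf{B}^T\mathbf{V}^T\} \tag{21}$$

where $\mathrm{tr}\{\cdot\}$ denotes the trace of a matrix.

For a given message matrix $\mathbf{B}$, the GLS optimal estimate of $\mathbf{V}$ can be obtained by differentiating the cost function $J$ with respect to $\mathbf{V}^T$ and setting the outcome equal to the zero matrix,

$$\frac{\partial J}{\partial \mathbf{V}^T} = -\mathbf{R}_z^{-1}\mathbf{Y}\mathbf{B}^T + \mathbf{R}_z^{-1}\mathbf{V}(\mathbf{B}\mathbf{B}^T) = \mathbf{0}, \tag{22}$$

$$\Rightarrow \hat{\mathbf{V}} = \mathbf{Y}\mathbf{B}^T(\mathbf{B}\mathbf{B}^T)^{-1}. \tag{23}$$

Proof of (12)

We manipulate the GLS cost function in the form of (21) to write

$$J = \mathrm{tr}\{\mathbf{R}_z^{-1}\mathbf{Y}\mathbf{Y}^T\} - \mathrm{tr}\{\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{Y}\mathbf{B}^T\} - \mathrm{tr}\{\mathbf{R}_z^{-1}\mathbf{V}\mathbf{B}\mathbf{Y}^T\} + \mathrm{tr}\{\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V}\mathbf{B}\mathbf{B}^T\}. \tag{24}$$

Pretend that $\mathbf{V}$ is known and relax the domain of the symbol information matrix to the real space, $\mathbf{B} \in \mathbb{R}^{K \times M}$. The GLS optimal estimate of $\mathbf{B} \in \mathbb{R}^{K \times M}$ can be calculated again by differentiation,

$$\frac{\partial J}{\partial \mathbf{B}^T} = -\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{Y} + \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V}\mathbf{B} = \mathbf{0}, \tag{25}$$

$$\Rightarrow \hat{\mathbf{B}} = (\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})^{-1}\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{Y}. \tag{26}$$

Proof of (13)

Since $\mathbf{R}_y = E\{\mathbf{y}\mathbf{y}^T\} = \mathbf{V}\mathbf{V}^T + \mathbf{R}_z$, by the Matrix Inversion Lemma (also known as Woodbury's Identity [37]), we can obtain

$$\mathbf{R}_y^{-1} = \mathbf{R}_z^{-1} - \mathbf{R}_z^{-1}\mathbf{V}(\mathbf{I} + \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})^{-1}\mathbf{V}^T\mathbf{R}_z^{-1}. \tag{27}$$

Then,

$$\begin{aligned}
\mathbf{V}^T\mathbf{R}_y^{-1}\mathbf{V} &= \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V} - \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V}(\mathbf{I} + \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})^{-1}\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V} \\
&= \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V}\left[\mathbf{I} - (\mathbf{I} + \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})^{-1}\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V}\right] \\
&= \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V}(\mathbf{I} + \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})^{-1}\left[(\mathbf{I} + \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V}) - \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V}\right] \\
&= \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V}(\mathbf{I} + \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})^{-1}.
\end{aligned} \tag{28}$$

By the property of the inverse of a product of matrices [37],

$$(\mathbf{V}^T\mathbf{R}_y^{-1}\mathbf{V})^{-1} = (\mathbf{I} + \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})(\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})^{-1} = (\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})^{-1} + \mathbf{I}. \tag{29}$$

We combine the results of (27) and (29) and finally obtain

$$(\mathbf{V}^T\mathbf{R}_y^{-1}\mathbf{V})^{-1}\mathbf{V}^T\mathbf{R}_y^{-1} = \left((\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})^{-1} + \mathbf{I}\right)\mathbf{V}^T\left(\mathbf{R}_z^{-1} - \mathbf{R}_z^{-1}\mathbf{V}(\mathbf{I} + \mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})^{-1}\mathbf{V}^T\mathbf{R}_z^{-1}\right) \tag{30}$$

$$= (\mathbf{V}^T\mathbf{R}_z^{-1}\mathbf{V})^{-1}\mathbf{V}^T\mathbf{R}_z^{-1}. \tag{31}$$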


